Improved Phone Posterior Estimation through k-NN and MLP-based Similarity
Authors
Abstract
I would like to thank Professor Hervé Bourlard, my master's thesis supervisor, who allowed me to carry out this thesis at the Idiap Research Institute and who gave me a very interesting subject: Improved Phone Posterior Estimation Through k-NN And MLP-based Similarity. I thank him, as well as Dr Mathew Magimai Doss and Mrs Afsaneh Asaei, for their kindness, guidance, availability and support throughout this thesis. I am also grateful to Professor Thierry Dutoit, my supervisor at the Faculté Polytechnique de Mons in Belgium, for giving me the opportunity to carry out my master's thesis within the international scientific and cultural context of the Idiap Research Institute. I would also particularly like to thank Mrs Nadine Rousseau and Mrs Sylvie Millius for their kindness, their advice and their help with all the administrative procedures and with finding an apartment in Martigny. Finally, I would like to thank everyone working at the Idiap Research Institute for their warm welcome, when I first arrived and throughout my master's thesis, and …

This work was performed in the context of a European ERASMUS exchange program, complemented by the AMIDA training program. AMIDA (Augmented Multi-party Interaction with Distance Access) is a European Union Integrated Project, contract number IST-033812. The authors gratefully thank the EU for its financial support, and all project partners for a fruitful collaboration. More information about AMIDA is available from the project web site www.amiproject.org.

Abstract

In this work, we investigate the possible use of k-nearest neighbour (kNN) classifiers to perform frame-based acoustic-phonetic classification, thus replacing the Gaussian Mixture Models (GMMs) or MultiLayer Perceptrons (MLPs) used in standard Hidden Markov Model (HMM) systems.
The driving motivation behind this idea is that kNN is known to be an "optimal" classifier if two conditions hold: a very large amount of training data is available (replacing the training of functional parameters by plain memorization of the training examples), and the correct distance metric is found. Nowadays, the amount of training data is no longer an issue. In the current work, we therefore focused specifically on the "correct" distance metric, mainly using an MLP to compute the probability that two input feature vectors belong to the same phonetic class. This MLP output is a similarity measure, which can then be turned into a distance metric for kNN. While providing a "universal" distance metric, this work also enabled us to consider the speech recognition problem under …
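The idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation. Frames are assumed to be plain feature vectors with phone labels. The posterior P(phone | x) is estimated as the fraction of the k nearest training frames carrying each label. The distance is taken as 1 minus a pairwise "same phone class" probability.

`toy_same_class_prob` is a HYPOTHETICAL stand-in for the trained MLP: a logistic function of the Euclidean distance with made-up weights. The `scaled_likelihoods` step divides posteriors by class priors. This follows common hybrid HMM/MLP practice; the abstract does not say whether the thesis does exactly this.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Frame = Sequence[float]
Example = Tuple[Frame, str]  # (acoustic feature vector, phone label)


def toy_same_class_prob(x: Frame, y: Frame) -> float:
    """HYPOTHETICAL stand-in for a trained MLP that outputs a value in (0, 1),
    read as P(x and y belong to the same phone class).

    It is only a logistic function of the Euclidean distance with made-up
    weights; a real system would train an MLP on pairs of frames instead.
    """
    d = math.dist(x, y)
    w, b = -4.0, 2.0  # illustrative values, not learned
    return 1.0 / (1.0 + math.exp(-(w * d + b)))


def knn_phone_posteriors(
    x: Frame,
    train: List[Example],
    k: int,
    similarity: Callable[[Frame, Frame], float],
) -> Dict[str, float]:
    """Estimate P(phone | x) as the fraction of the k nearest training frames
    carrying each phone label, using distance = 1 - similarity."""
    neighbours = sorted(train, key=lambda ex: 1.0 - similarity(x, ex[0]))[:k]
    counts = Counter(label for _, label in neighbours)
    labels = sorted({label for _, label in train})
    return {q: counts[q] / k for q in labels}


def scaled_likelihoods(posteriors: Dict[str, float], train: List[Example]) -> Dict[str, float]:
    """Divide each posterior by its class prior (relative frequency in the
    training set), giving scores proportional to p(x | phone)."""
    priors = Counter(label for _, label in train)
    n = len(train)
    return {q: p / (priors[q] / n) for q, p in posteriors.items()}


# Usage on a tiny, made-up two-phone training set.
train = [
    ((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.2), "a"),
    ((1.0, 1.0), "b"), ((1.1, 0.9), "b"),
]
post = knn_phone_posteriors((1.0, 1.0), train, k=3, similarity=toy_same_class_prob)
print(post)  # two of the three nearest frames are "b": a -> 1/3, b -> 2/3
print(scaled_likelihoods(post, train))
```

Because the toy similarity decreases monotonically with Euclidean distance, this sketch ranks neighbours exactly as plain Euclidean kNN would. The point of a trained MLP similarity is precisely to learn a different, phonetically motivated ranking.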
Similar resources
In-context phone posteriors as complementary features for tandem ASR
In this paper, we present a method for integrating possible prior knowledge (such as phonetic and lexical knowledge), as well as acoustic context (e.g., the whole utterance) in the phone posterior estimation, and we propose to use the obtained posteriors as complementary posterior features in Tandem ASR configuration. These posteriors are estimated based on HMM state posterior probability defin...
A TS Fuzzy Model Derived from a Typical Multi-Layer Perceptron
In this paper, we introduce a Takagi-Sugeno (TS) fuzzy model which is derived from a typical Multi-Layer Perceptron Neural Network (MLP NN). At first, it is shown that the considered MLP NN can be interpreted as a variety of TS fuzzy model. It is discussed that the utilized Membership Function (MF) in such TS fuzzy model, despite its flexible structure, has some major restrictions. After modify...
High performance automatic mispronunciation detection method based on neural network and TRAP features
In this paper, we propose a new approach to utilize temporal information and neural network (NN) to improve the performance of automatic mispronunciation detection (AMD). Firstly, the alignment results between speech signals and corresponding phoneme sequences are obtained within the classic GMM-HMM framework. Then, the long-time TempoRAl Patterns (TRAPs) [5] features are introduced to describe...
A learning approach in link adaptation for MIMO-OFDM systems
We propose a neural network (NN)-based adaptive modulation and coding (AMC) for link adaptation in MIMO-OFDM systems. The AMC optimizes the best modulation and coding scheme (MCS) under a packet error rate (PER) constraint. In our approach, a NN with a multilayer perceptron (MLP) structure is applied for the AMC and its performance is compared with the k-nearest neighbor (k-NN) algorithm under ...
A novel method for detecting structural damage based on data-driven and similarity-based techniques under environmental and operational changes
The applications of time series modeling and statistical similarity methods to structural health monitoring (SHM) provide promising and capable approaches to structural damage detection. The main aim of this article is to propose an efficient univariate similarity method named as Kullback similarity (KS) for identifying the location of damage and estimating the level of damage severity. An impr...
Publication date: 2008